26 research outputs found
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The rapid momentum of technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from his or her voice regardless of the content (i.e. text-independent), and to design efficient methods of combining face and voice to produce a robust authentication system.
A novel approach to speaker identification is developed using wavelet analysis and multiple neural networks, including the Probabilistic Neural Network (PNN), General Regression Neural Network (GRNN) and Radial Basis Function Neural Network (RBF-NN), combined with an AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been validated against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared to classical Mel-Frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to the Back-Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
Another novel approach, using vowel formant analysis, is implemented with Linear Discriminant Analysis (LDA). Vowel-formant-based speaker identification is well suited to real-time implementation and requires only a few bytes of information to be stored per speaker, making it both storage- and time-efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme requires no training time other than creating a small database of vowel formants, it is faster as well. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, whereas the proposed score-based methodology stays almost linear.
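As a rough illustration of the formant-extraction step mentioned above, the sketch below derives formant frequencies from the angles of the LPC poles. The sampling rate, LPC order and all function names are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def lpc(x, order=8):
    """Autocorrelation-method LPC: solve the Yule-Walker normal
    equations R a = r for the prediction coefficients, and return the
    polynomial A(z) = 1 - sum_k a_k z^-k."""
    x = np.asarray(x, dtype=float)
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

def formants(x, fs, order=8):
    """Formants = angles of the complex roots of A(z), converted to Hz.
    Keep one root of each conjugate pair and sort ascending."""
    roots = np.roots(lpc(x, order))
    roots = roots[np.imag(roots) > 0.01]
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```

In practice the roots would also be filtered by bandwidth (pole radius) before being accepted as formants.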
Finally, a novel audio-visual fusion-based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both the score-level and decision-level (with OR voting) fusions were shown to outperform feature-level fusion in terms of accuracy and error resilience. This result is in line with the distinct nature of the two modalities, which is lost when they are combined at the feature level. The GRID and VidTIMIT test results confirm that the proposed scheme is one of the best candidates for the fusion of face and voice due to its low computational time and high recognition accuracy.
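The score-level and decision-level fusion rules described above can be sketched as follows; the min-max normalisation, the equal weighting and the function names are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np

def score_level_fusion(voice_scores, face_scores, w=0.5):
    """Weighted-sum score fusion: normalise each modality's per-identity
    scores to [0, 1], combine them, and return the index of the
    identity with the highest fused score."""
    v = (voice_scores - voice_scores.min()) / (np.ptp(voice_scores) + 1e-9)
    f = (face_scores - face_scores.min()) / (np.ptp(face_scores) + 1e-9)
    fused = w * v + (1 - w) * f
    return int(np.argmax(fused))

def decision_level_or(voice_accepts, face_accepts):
    """OR voting at the decision level: accept if either modality accepts."""
    return voice_accepts or face_accepts
```

Feature-level fusion would instead concatenate MFCC and PCA feature vectors before classification, which is what the abstract reports as the weaker option.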
Indoor visible light communication localization system utilizing received signal strength indication technique and trilateration method
Visible light communication (VLC) based on light-emitting diode (LED) technology not only provides higher data rates for indoor wireless communications while offering room illumination, but also has the potential for indoor localization. VLC-based indoor positioning using the received optical power levels from emitting LEDs is investigated. We consider both line-of-sight (LOS) positioning and LOS combined with non-LOS (LOS+NLOS) positioning. The performance of the proposed system is evaluated under both noisy and noiseless channels, as is the impact of different location codes on the positioning error. The analytical model of the system with noise and the corresponding numerical evaluation for a range of signal-to-noise ratios (SNR) are presented. The results show that an accuracy of 12 dB
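The RSS-based ranging and trilateration steps might look roughly like this; the Lambertian order, LED layout, mounting height and lumped channel constant are all hypothetical, and the paper's actual channel model may differ.

```python
import numpy as np

# Hypothetical layout: three ceiling LEDs at known (x, y) positions,
# all at the same vertical distance H above the receiver plane.
LEDS = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
H = 2.5   # LED-receiver height difference (m), assumed known
C = 1.0   # lumped constant (Tx power, detector area, gains)

def distance_from_rss(p_rx, m=1):
    """Invert a simplified Lambertian LOS model
    P_rx = C * H**(m+1) / d**(m+3)  =>  d = (C * H**(m+1) / P_rx)**(1/(m+3))."""
    return (C * H ** (m + 1) / p_rx) ** (1.0 / (m + 3))

def trilaterate(d):
    """Linearised trilateration: subtract the first circle equation from
    the others and solve the resulting linear system for (x, y)."""
    r = np.sqrt(np.maximum(d ** 2 - H ** 2, 0.0))  # horizontal ranges
    x0, y0 = LEDS[0]
    A, b = [], []
    for (xi, yi), ri in zip(LEDS[1:], r[1:]):
        A.append([2 * (xi - x0), 2 * (yi - y0)])
        b.append(r[0] ** 2 - ri ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2)
    pos, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return pos
```

With noisy received powers, the same least-squares step simply yields a position estimate whose error grows as SNR falls, which is the trade-off the paper evaluates.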
Gait recognition for person re-identification
Person re-identification across multiple cameras is an essential task in computer vision applications, particularly for tracking the same person across different scenes. Gait recognition, i.e. recognition based on walking style, is commonly used for this purpose because human gait has unique characteristics that allow a person to be recognized from a distance. However, gait-based human recognition can be limited by the viewpoint of the captured images or videos. Hence, this paper proposes a gait recognition approach for person re-identification. The proposed approach first estimates the angle of the gait; the recognition process then follows, performed using convolutional neural networks. Herein, multitask convolutional neural network models and extracted gait energy images (GEIs) are used to estimate the angle and recognize the gait. GEIs are extracted by first detecting the moving objects using background subtraction techniques. Training and testing phases are applied to three well-known datasets: CASIA-B, OU-ISIR, and OU-MVLP. The proposed method is also evaluated for background modeling using the Scene Background Modeling and Initialization (SBI) dataset. The proposed gait recognition method showed an accuracy of more than 98% on almost all datasets. The proposed approach achieved higher accuracy than other methods on CASIA-B and OU-MVLP, and the best results on the OU-ISIR dataset.
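The GEI-extraction step (background subtraction followed by silhouette averaging) can be sketched as below; the fixed threshold and the assumption of pre-aligned silhouettes are simplifications of what a real pipeline would do.

```python
import numpy as np

def extract_silhouettes(frames, background, threshold=30):
    """Background subtraction: a pixel belongs to the moving person when
    its absolute difference from the background model exceeds a
    threshold, yielding one binary silhouette per frame."""
    return [(np.abs(f.astype(int) - background.astype(int)) > threshold)
            .astype(float) for f in frames]

def gait_energy_image(silhouettes):
    """GEI: pixel-wise average of the (aligned) binary silhouettes over
    one gait cycle, giving per-pixel values in [0, 1] that encode how
    often each pixel is occupied during the cycle."""
    return np.mean(np.stack(silhouettes), axis=0)
```

The resulting GEI is the single grey-level image that the multitask CNN would consume for both angle estimation and identity recognition.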
3D objects and scenes classification, recognition, segmentation, and reconstruction using 3D point cloud data: A review
Three-dimensional (3D) point cloud analysis has become one of the most attractive subjects in realistic imaging and machine vision due to its simplicity, flexibility and powerful capacity for visualization. Indeed, the representation of scenes and buildings using 3D shapes and formats has enabled many applications, including autonomous driving and scene and object reconstruction. Nevertheless, working with this emerging type of data remains challenging for object representation, scene recognition, segmentation, and reconstruction. In this regard, significant effort has recently been devoted to developing novel strategies using different techniques, such as deep learning models. To that end, we present in this paper a comprehensive review of existing tasks on 3D point clouds: a well-defined taxonomy of existing techniques is established based on the nature of the adopted algorithms, application scenarios, and main objectives. Various tasks performed on 3D point cloud data are investigated, including object and scene detection, recognition, segmentation and reconstruction. In addition, we introduce a list of the datasets used, discuss the respective evaluation metrics, and compare the performance of existing solutions to better characterize the state of the art and identify their limitations and strengths. Lastly, we elaborate on the current challenges facing the field and the future trends attracting considerable interest, which could be a starting point for upcoming research studies.
PRNU-Net: a Deep Learning Approach for Source Camera Model Identification based on Videos Taken with Smartphone
Recent advances in digital imaging mean that every smartphone has a video camera that can record high-quality video freely and without restrictions. In addition, rapidly developing Internet technology has contributed significantly to the widespread distribution of digital video via web-based multimedia systems and mobile applications such as YouTube, Facebook, Twitter, WhatsApp, etc. However, as the recording and distribution of digital video have become affordable, security issues have become threatening and have spread worldwide. One of these security issues is the identification of the source camera of a video. Generally, two common categories of methods are used in this area, namely Photo Response Non-Uniformity (PRNU) and machine learning approaches. To exploit the power of both approaches, this work adds a new PRNU-based layer to a convolutional neural network (CNN), called PRNU-Net. To explore the new layer, the main structure of the CNN is based on MISLnet, which has been used in several studies to identify the source camera. The experimental results show that PRNU-Net is more successful than MISLnet, and that the PRNU extracted by the layer from low-level features, namely edges and textures, is more useful than high- and mid-level features, namely parts and objects, for classifying source camera models. On average, the network improves the results on a new database by about 4
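For context, a minimal sketch of classical PRNU fingerprint estimation, which a PRNU-based layer builds on, is shown below; the box-filter denoiser is a crude stand-in for the wavelet denoiser normally used, and nothing here is the paper's exact layer design.

```python
import numpy as np

def denoise_mean(img, k=3):
    """Crude denoiser: k x k box filter (real PRNU pipelines use a
    wavelet-based denoiser instead)."""
    pad = k // 2
    p = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def prnu_fingerprint(images):
    """Maximum-likelihood PRNU estimate K = sum(W_i * I_i) / sum(I_i^2),
    where W_i = I_i - denoise(I_i) is the noise residual of frame i."""
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for img in images:
        img = img.astype(float)
        w = img - denoise_mean(img)  # sensor-noise residual
        num += w * img
        den += img ** 2
    return num / (den + 1e-12)
```

A test video's residual would then be correlated against each candidate camera's fingerprint, whereas PRNU-Net instead learns this comparison end-to-end.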
A combined multiple action recognition and summarization for surveillance video sequences
Human action recognition and video summarization represent challenging tasks for several computer vision applications, including video surveillance, criminal investigations, and sports applications. For long videos, it is difficult to search within a video for a specific action and/or person. Usually, human action recognition approaches presented in the literature deal with videos that contain only a single person, whose action they are able to recognize. This paper proposes an effective approach to multiple human action detection, recognition, and summarization. The multiple-action detection extracts the silhouettes of human bodies, then generates a specific sequence for each of them using a motion detection and tracking method. Each of the extracted sequences is then divided into shots that represent homogeneous actions in the sequence, using the similarity between each pair of frames. Using the histogram of oriented gradients (HOG) of the Temporal Difference Map (TDMap) of the frames of each shot, we recognize the action by comparing the generated HOG against the HOGs built in the training phase, which represent the HOGs of many actions computed from a set of training videos. We also recognize the action from the TDMap images using a proposed CNN model. Action summarization is then performed for each detected person. The efficiency of the proposed approach is shown through the results obtained, mainly for multi-action detection and recognition.
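The HOG-of-TDMap matching described above might be sketched as follows. This simplified HOG uses a single global orientation histogram rather than the cell/block scheme of full HOG, and all function names are illustrative.

```python
import numpy as np

def tdmap(frames):
    """Temporal Difference Map: accumulate absolute differences of
    consecutive frames to summarise the motion within a shot."""
    frames = [f.astype(float) for f in frames]
    return sum(np.abs(b - a) for a, b in zip(frames, frames[1:]))

def hog_descriptor(img, bins=9):
    """Simplified HOG: one magnitude-weighted orientation histogram over
    the whole map, L2-normalised."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-9)

def recognise(query_hog, train_hogs, labels):
    """Nearest-neighbour match against the training-phase HOGs."""
    dists = [np.linalg.norm(query_hog - h) for h in train_hogs]
    return labels[int(np.argmin(dists))]
```

The paper's CNN variant would replace the nearest-neighbour step with a learned classifier over the TDMap images.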
Robust gait recognition: A comprehensive survey
Gait recognition has emerged as an attractive biometric technology for the identification of people by analysing the way they walk. However, one of the main challenges of the technology is addressing the effects of the various inherent intra-class variations caused by covariate factors, such as clothing, carrying conditions, and view angle, that adversely affect recognition performance. The main aim of this survey is to provide a comprehensive overview of existing robust gait recognition methods. This is intended to provide researchers with state-of-the-art approaches in order to help advance the research topic, through an understanding of basic taxonomies, comparisons, and summaries of the state-of-the-art performances on several widely used gait recognition datasets. This publication was made possible by NPRP grant # NPRP 8-140-2-065 from the Qatar National Research Fund (a member of Qatar Foundation).
Face recognition and summarization for surveillance video sequences
Face recognition and video summarization represent challenging tasks for several computer vision applications, including video surveillance, criminal investigations, and sports applications. For long videos, it is difficult to search within a video for a specific action and/or person. Usually, human action recognition approaches presented in the literature deal with videos that contain only a single person, whose action they are able to recognize. This paper proposes an effective approach to multiple human action detection, recognition, and summarization. The multiple-action detection extracts the silhouettes of human bodies, then generates a specific sequence for each of them using a motion detection and tracking method. Each of the extracted sequences is then divided into shots that represent homogeneous actions in the sequence, using the similarity between each pair of frames. Using the histogram of oriented gradients (HOG) of the Temporal Difference Map (TDMap) of the frames of each shot, we recognize the action by comparing the generated HOG against the HOGs built in the training phase, which represent the HOGs of many actions computed from a set of training videos.
Automatic Detection and Classification of Audio Events for Road Surveillance Applications
This work investigates the problem of detecting hazardous events on roads by designing an audio surveillance system that automatically detects perilous situations such as car crashes and tire skidding. In recent years, several visual surveillance systems have been proposed for road monitoring to detect accidents, with the aim of improving safety procedures in emergency cases. However, visual information alone cannot detect certain events such as car crashes and tire skidding, especially under adverse and visually cluttered weather conditions such as snowfall, rain, and fog. Consequently, the incorporation of microphones and audio event detectors based on audio processing can significantly enhance the detection accuracy of such surveillance systems. This paper proposes to combine time-domain, frequency-domain, and joint time-frequency features extracted from a class of quadratic time-frequency distributions (QTFDs) to detect events on roads through audio analysis and processing. Experiments were carried out using a publicly available dataset. The experimental results confirm the effectiveness of the proposed approach for detecting hazardous events on roads, as demonstrated by a 7% improvement in accuracy when compared against methods that use individual temporal and spectral features.
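As a toy illustration of combining time- and frequency-domain audio features (the QTFD-based joint time-frequency features are omitted for brevity), one might compute a small feature vector like this; the feature choice is an assumption for illustration, not the paper's feature set.

```python
import numpy as np

def audio_features(x, fs):
    """Toy feature vector mixing domains: short-term energy and
    zero-crossing rate (time domain) plus spectral centroid
    (frequency domain)."""
    x = np.asarray(x, dtype=float)
    energy = np.mean(x ** 2)
    # Each zero crossing makes np.sign jump by 2, hence the / 2.
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)
    return np.array([energy, zcr, centroid])
```

Such vectors, extended with the QTFD-derived features, would then feed a standard classifier to separate crashes and skids from normal road noise.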